Microblog Processing: A Study

Author

  • Sandip Modha

Abstract

Processing microblogs for retrieval and summarization has become a challenging area for the information retrieval community. Twitter is one of the most popular microblogging platforms. In this paper, Twitter posts, called tweets, are studied from the retrieval and extractive summarization perspectives. Given a set of topics (also called interest profiles or information requirements), a microblog summarization system is designed that processes the Twitter sample status stream and generates a day-wise, topic-wise tweet summary. Since the volume of the Twitter public status stream is very large, tweet filtering, that is, retrieval of the relevant tweets, is the primary task for the summarization system. To measure the relevance between tweets and interest profiles, language models with Jelinek-Mercer (JM) smoothing and Dirichlet smoothing, and the Okapi BM25 model, are used. The behaviour of the language model smoothing parameters, λ for JM smoothing and μ for Dirichlet smoothing, is also studied. Summarization is treated as a clustering problem. The TREC MB 2015 and TREC RTS 2016 datasets are used for the experiments. The official TREC RTS metrics nDCG@10-1 and nDCG@10-0 are used to evaluate the results. A detailed post hoc analysis of the experimental results is also performed.
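To make the role of the two smoothing parameters concrete, here is a minimal Python sketch of query-likelihood scoring with Jelinek-Mercer and Dirichlet smoothing. It is not the paper's implementation: the function names, the toy tweets, the default values of `lam` and `mu`, and the convention that λ weights the collection model are all assumptions made for illustration.

```python
import math
from collections import Counter


def jm_score(profile, tweet, coll_tf, coll_len, lam=0.5):
    """Log-likelihood of the profile terms under a JM-smoothed tweet model.

    Assumed convention: p(w|d) = (1 - lam) * tf(w, d) / |d| + lam * p(w|C).
    """
    tf = Counter(tweet)
    dlen = len(tweet)
    score = 0.0
    for w in profile:
        p_coll = coll_tf.get(w, 0) / coll_len
        p_doc = tf[w] / dlen if dlen else 0.0
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term unseen in both the tweet and the collection.
            return float("-inf")
        score += math.log(p)
    return score


def dirichlet_score(profile, tweet, coll_tf, coll_len, mu=100.0):
    """Log-likelihood of the profile terms under a Dirichlet-smoothed model.

    p(w|d) = (tf(w, d) + mu * p(w|C)) / (|d| + mu).
    """
    tf = Counter(tweet)
    dlen = len(tweet)
    score = 0.0
    for w in profile:
        p_coll = coll_tf.get(w, 0) / coll_len
        p = (tf[w] + mu * p_coll) / (dlen + mu)
        if p == 0.0:
            return float("-inf")
        score += math.log(p)
    return score


# Hypothetical toy data: two tokenized tweets and one interest profile.
tweets = [["flood", "relief", "chennai"], ["new", "phone", "launch"]]
coll_tf = Counter(w for t in tweets for w in t)  # collection term counts
coll_len = sum(coll_tf.values())                 # total tokens in collection
profile = ["flood", "relief"]

for t in tweets:
    print(t,
          jm_score(profile, t, coll_tf, coll_len),
          dirichlet_score(profile, t, coll_tf, coll_len))
```

In this sketch a larger `lam` or `mu` pulls the tweet model toward the collection statistics. Because tweets are very short, the Dirichlet denominator `|d| + mu` is dominated by `mu` unless `mu` is small, which is one reason the behaviour of these parameters on tweets is worth studying.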


Similar resources

TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text

Twitter is the largest source of microblog text, responsible for gigabytes of human discourse every day. Processing microblog text is difficult: the genre is noisy, documents have little context, and utterances are very short. As such, conventional NLP tools fail when faced with tweets and other microblog text. We present TwitIE, an open-source NLP pipeline customised to microblog text at every...


Soochow University Word Segmenter for SIGHAN 2012 Bakeoff

This paper presents a Chinese Word Segmentation system on MicroBlog corpora for the CIPS-SIGHAN Word Segmentation Bakeoff 2012. Our system employs Conditional Random Fields (CRF) as the segmentation model. To make our model more adaptive to MicroBlog, we manually analyze and annotate many MicroBlog messages. After manually checking and analyzing the MicroBlog text, we propose several pre-proces...


An Empirical Study on Chinese Microblog Stance Detection Using Supervised and Semi-supervised Machine Learning Methods

Nowadays, more and more people are willing to express their opinions and attitudes on microblog platforms. Stance detection refers to the task of judging whether the author of a text is in favor of or against a given target. Most of the existing literature addresses debates or online conversations, which have adequate context for inferring the authors’ stances. However, for detecting...


Thread Cleaning and Merging for Microblog Topic Detection

As a classic natural language processing technology, topic detection has recently attracted more research interest, due largely to the rapid development of microblogs. The most challenging issue in microblog topic detection is the sparse data problem. In this paper, the temporal-author-topic (TAT) model is designed to accomplish microblog topic detection in two phases. In the first phase, the TAT model i...


CLIP at TREC 2015: Microblog and LiveQA

The Computational Linguistics and Information Processing lab at the University of Maryland participated in two TREC tracks this year. The Microblog Real-Time Filtering and the LiveQA tasks both involve information processing in real time. We submitted nine runs in total, achieving relatively good results. This paper describes the architecture and configuration of the systems behind those runs.


Microblog Track 2011 of FDU

Twitter provides a huge number of short messages, which raises challenging problems for the research community. The Microblog Track of TREC examines the special behavior of the Twitter dataset in the “real-time” retrieval task. This paper reports our participation in the Microblog Track task. Given the query topics, each participant is required to conduct a “real-time” retrieval task, which seeks for the...




Publication date: 2017